Improving visual features for lip-reading
نویسندگان
چکیده
Automatic speech recognition systems that utilise the visual modality of speech often are investigated within a speakerdependent or a multi-speaker paradigm. That is, during training the recogniser will have had prior exposure to example speech from each of the possible test speakers. In a previous paper we highlighted the danger of not using different speakers in the training and test sets, and demonstrated that, within a speakerindependent configuration, lip-reading performance degrades dramatically due to the speaker variability encoded in the visual features. In this paper, we examine feature improvement techniques to reduce speaker variability. We demonstrate that, by careful choice of technique, the effects of inter-speaker variability in the visual features can be reduced, which improves significantly the recognition accuracy of an automated lip-reading system. However, the performance of the lip-reading system still is significantly below that of acoustic speech recognition systems, and an analysis of the confusion matrices generated by the recogniser suggests this largely is due to the number of deletions apparent in a visual-only system.
منابع مشابه
Improving Lip-reading with Feature Spac Audio-Visual Speech R
In this paper we investigate feature space transforms to improve lip-reading performance for multi-stream HMM based audio-visual speech recognition (AVSR). The feature space transforms include non-linear Gaussianization transform and feature space maximum likelihood linear regression (fMLLR). We apply Gaussianization at the various stages of visual front-end. The results show that Gaussianizing...
متن کاملDesigning and implementing a system for Automatic recognition of Persian letters by Lip-reading using image processing methods
For many years, speech has been the most natural and efficient means of information exchange for human beings. With the advancement of technology and the prevalence of computer usage, the design and production of speech recognition systems have been considered by researchers. Among this, lip-reading techniques encountered with many challenges for speech recognition, that one of the challenges b...
متن کامللبخوانی: روش جدید احراز هویت در برنامههای کاربردی گوشیهای تلفن همراه اندروید
Today, mobile phones are one of the first instruments every individual person interacts with. There are lots of mobile applications used by people to achieve their goals. One of the most-used applications is mobile banks. Security in m-bank applications is very important, therefore modern methods of authentication is required. Most of m-bank applications use text passwords which can be stolen b...
متن کاملAn Efficient Lip-reading Method Using K-nearest Neighbor Algorithm
Many studies have been carried out on lip reading, most of those works are based on color images, while some essential features might not be obtained, like inner lip information. In this paper, RGBD camera will be introduced for improving the recognition rate of lip reading. We try to complete lip reading through using only gray-scale images. Thirteen groups of words are given, and we present e...
متن کاملDecoding visemes: improving machine lipreading (PhD thesis)
This thesis is about improving machine lip-reading, that is, the classification of speech from only visual cues of a speaker. Machine lip-reading is a niche research problem in both areas of speech processing and computer vision. Current challenges for machine lip-reading fall into two groups: the content of the video, such as the rate at which a person is speaking or; the parameters of the vid...
متن کامل